CUD: Crowdsourcing for URL Spam Detection
ثبت نشده
چکیده
The prevalence of spam URLs in Internet services, such as email, social networks, blogs and online forums has become a serious problem. These spam URLs host spam advertisements, phishing attempts, and malwares, which are harmful for normal users. Existing URL blacklist approaches offer limited protection. Although recent machine learning based URL classification approaches demonstrate good accuracy and reasonable throughput, they are based on observations from existing spam URLs and hard to detect new spam URLs when attackers employ new strategies. In this paper, we present CUD (Crowdsourcing for URL spam detection) as a supplement of existing detection tools. CUD leverages human intelligence for URL classification through crowdsourcing. CUD crawls existing user comments about spamURLs already on the Internet, and employs sentiment analysis from nature language processing to analyze the user comments automatically for detecting spam URLs. Since CUD does not using features directly associated with the URLs and their landing pages, it is more robust when attackers change their strategies. Through evaluation, we find up to 70% of URLs have user comments online. CUD achieves an accuracy of 86.8% in terms of true positive rate with a false positive rate 0.9%. Moreover, about 75% of spam URLs CUD detects are missed by other approaches. Therefore, CUD can be used as a good complement to other approaches.
منابع مشابه
Feature-based Malicious URL and Attack Type Detection Using Multi-class Classification
Nowadays, malicious URLs are the common threat to the businesses, social networks, net-banking etc. Existing approaches have focused on binary detection i.e. either the URL is malicious or benign. Very few literature is found which focused on the detection of malicious URLs and their attack types. Hence, it becomes necessary to know the attack type and adopt an effective countermeasure. This pa...
متن کاملExploration of gaps in Bitly's spam detection and relevant counter measures
Existence of spam URLs over emails and Online Social Media (OSM) has become a growing phenomenon. To counter the dissemination issues associated with long complex URLs in emails and character limit imposed on various OSM (like Twitter), the concept of URL shortening gained a lot of traction. URL shorteners take as input a long URL and give a short URL with the same landing page in return. With ...
متن کاملExploration of gaps in Bitly ’ s spam detection and relevant counter measures Student
Existence of spam URLs over emails and Online Social Media (OSM) has become a growing phenomenon. To counter the dissemination issues associated with long complex URLs in emails and character limit imposed on various OSM (like Twitter), the concept of URL shortening gained a lot of traction. URL shorteners take as input a long URL and give a short URL with the same landing page in return. With ...
متن کاملMSRC at TREC 2011 Crowdsourcing Track
Crowdsourcing useful data, such as reliable relevance labels for pairs of topics and documents, requires a multidisciplinary approach that spans aspects such as user interface and interaction design, incentives, crowd engagement and management, and spam detection and filtering. Research has shown that the design of a crowdsourcing task can significantly impact the quality of the obtained data, ...
متن کاملAn Effective Model for SMS Spam Detection Using Content-based Features and Averaged Neural Network
In recent years, there has been considerable interest among people to use short message service (SMS) as one of the essential and straightforward communications services on mobile devices. The increased popularity of this service also increased the number of mobile devices attacks such as SMS spam messages. SMS spam messages constitute a real problem to mobile subscribers; this worries telecomm...
متن کامل